Structural Bayesian language modeling and adaptation

Authors

  • Sibel Yaman
  • Jen-Tzung Chien
  • Chin-Hui Lee
Abstract

We propose a language modeling and adaptation framework based on the Bayesian structural maximum a posteriori (SMAP) principle, in which each n-gram event is embedded in a branch of a tree structure. The nodes in the first layer of this tree represent the unigrams, those in the second layer represent the bigrams, and so on. Each node has an associated hyper-parameter, which carries the information about the prior distribution, and a count, which records the number of times the word sequence occurs in the domain-specific data. In general, the hyper-parameters depend on the observation frequency of not only the node event but also its parent node, which represents the lower-order n-gram event. Our automatic speech recognition experiments on the Wall Street Journal corpus verify that the proposed SMAP language model adaptation achieves a 5.6% relative improvement over maximum likelihood language models obtained with the same training and adaptation data sets.
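
The tree described in the abstract can be sketched in Python as follows. This is an illustrative sketch only: the abstract does not give the SMAP update equations, so the smoothing rule used here is a generic Dirichlet-prior MAP estimate in which the prior for an n-gram is taken from the lower-order estimate. That rule is an assumption, not the authors' formulation. The names `Node`, `build_tree`, `find`, `map_prob`, and the prior weight `tau` are hypothetical.

```python
class Node:
    """One n-gram event: depth 1 is a unigram, depth 2 a bigram, and so on."""

    def __init__(self):
        self.count = 0      # occurrences in the domain-specific data
        self.children = {}  # next word -> Node (the higher-order n-gram)


def build_tree(tokens, max_order=2):
    """Count every n-gram up to max_order and store it as a branch of the tree."""
    root = Node()
    for i in range(len(tokens)):
        node = root
        for word in tokens[i:i + max_order]:
            node = node.children.setdefault(word, Node())
            node.count += 1
    return root


def find(root, ngram):
    """Return the node for a word tuple, or None if it was never observed."""
    node = root
    for word in ngram:
        node = node.children.get(word)
        if node is None:
            return None
    return node


def map_prob(root, history, word, vocab_size, tau=1.0):
    """MAP-style estimate of P(word | history).

    Assumption (not from the paper): the hyper-parameter of an n-gram is
    tau times the estimate obtained with a shortened history, and the
    unigram prior is uniform over the vocabulary.
    """
    if history:
        prior = map_prob(root, history[1:], word, vocab_size, tau)
    else:
        prior = 1.0 / vocab_size
    hist_node = find(root, history)  # the root itself when history is empty
    if hist_node is None:
        return prior  # unseen history: fall back to the prior
    child = hist_node.children.get(word)
    c_hw = child.count if child is not None else 0
    c_h = sum(c.count for c in hist_node.children.values())
    return (c_hw + tau * prior) / (c_h + tau)


# Toy adaptation data; real experiments would use a corpus such as WSJ.
tokens = "a b a b a c".split()
tree = build_tree(tokens, max_order=2)
vocab = sorted(set(tokens))
for w in vocab:
    print(w, map_prob(tree, ("a",), w, len(vocab)))
```

With this rule the estimates for a fixed history sum to one over the vocabulary, and a sparsely observed history leans more heavily on its lower-order prior, which is the general behavior a structural prior is meant to provide.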

Similar resources

A Hierarchical Bayesian Approach for Semi-supervised Discriminative Language Modeling

Discriminative language modeling provides a mechanism for differentiating between competing word hypotheses, which are usually ignored in traditional maximum likelihood estimation of N-gram language models. Discriminative language modeling usually requires manual transcription, which can be costly and slow to obtain. On the other hand, there is a vast amount of untranscribed speech data on which ...

Modeling Structural Relationships Between Epistemological Beliefs and Mediating Learning Strategies on Anxiety in English Students

Introduction: The purpose of this study was to investigate the modeling of structural relationships between epistemological beliefs and mediating learning strategies on the English language anxiety of third-year high school girl students in Babol. Methods: The correlational research was based on structural equation modeling. The statistical population consisted of 3rd grade high school girl s...

Language Proficiency and Identity: Developing a Structural Equation Modeling (SEM) of Identity for Iranian EFL Learners

This study was an endeavor to develop a model of identity among Iranian EFL learners. To achieve this end, a multiphase design was implemented. Initially, it attempted to investigate different factors of identity to propose and validate a model. Thus, 120 EFL learners studying in different English language institutes in Iran were randomly selected, and 36 learners were interviewed about their v...

A Model of Iranian EFL Learners' Cultural Identity: A Structural Equation Modeling Approach

This study aimed, firstly, to investigate the underlying components of Iranian cultural identity and, secondly, to confirm the aforementioned components via Structural Equation Modeling (SEM) analysis. In order to achieve these goals, the researchers reviewed the extensive local and international literature on language, culture and identity. Based on the literature and consultations with a grou...

Optimal on-line Bayesian model selection for speaker adaptation

In this paper, we show how to accommodate a Bayesian variant of Rissanen’s MDL into on-line Bayesian adaptation to control both model structural complexity and parameterization complexity to best fit an available amount of adaptation data, the goal being minimization of the resulting recognition error. An efficient bottom-up, dynamic-programming-based pruning algorithm is developed for selecting mode...

Publication date: 2007